Compressing Relations and Indexes
Abstract
We propose a new compression algorithm that is tailored to database applications. It can be applied to a collection of records and is especially effective for records with many low- to medium-cardinality fields and numeric fields. In addition, this new technique supports very fast decompression. Promising application domains include decision support systems (DSS), since fact tables, which are by far the largest tables in these applications, contain many low- and medium-cardinality fields and typically no text fields. Further, our decompression rates are faster than typical disk throughputs for sequential scans; in contrast, gzip is slower. This is important in DSS applications, which often scan large ranges of records.

An important distinguishing characteristic of our algorithm, in contrast to compression algorithms proposed earlier, is that we can decompress individual tuples (even individual fields) rather than a full page or an entire relation at a time. Also, all the information needed for tuple decompression resides on the same page as the tuple. This means that a page can be stored in the buffer pool and used in compressed form, simplifying the job of the buffer manager and improving memory utilization.

Our compression algorithm also significantly improves index structures such as B-trees and R-trees by reducing the number of leaf pages and compressing index entries, which greatly increases the fan-out. We can also use lossy compression on the internal nodes of an index.
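The abstract does not spell out the encoding itself. As a hedged illustration of the properties it claims (compact low-cardinality and numeric fields, decompression metadata kept on the same page, and decoding a single field without touching the rest of the page), the sketch below assumes frame-of-reference encoding. Each page records the minimum value of every column plus the number of bits needed for the largest offset from that minimum, and every field is stored as a fixed-width offset. Treat this as an assumption for illustration, not necessarily the paper's exact scheme; the class `CompressedPage` and its methods are hypothetical names, and the packed page is modeled as a single Python integer rather than a real disk page.

```python
from typing import List, Sequence, Tuple


class CompressedPage:
    """Hypothetical page storing integer tuples with frame-of-reference
    encoding. The header (per-column minimum and bit width) lives with the
    packed tuples, so the page can be decoded without any external state."""

    def __init__(self, rows: Sequence[Sequence[int]]) -> None:
        if not rows:
            raise ValueError("a page must hold at least one row")
        self.ncols = len(rows[0])
        self.nrows = len(rows)
        # Header part 1: the smallest value of each column on this page.
        self.mins: List[int] = [min(r[c] for r in rows) for c in range(self.ncols)]
        # Header part 2: bits needed for the largest offset from that minimum.
        # A column with a single distinct value on the page needs 0 bits.
        self.widths: List[int] = [
            (max(r[c] for r in rows) - self.mins[c]).bit_length()
            for c in range(self.ncols)
        ]
        # Bit position of each field inside a fixed-width packed tuple.
        self.field_offsets: List[int] = []
        pos = 0
        for w in self.widths:
            self.field_offsets.append(pos)
            pos += w
        self.tuple_bits = pos
        # Pack every (value - min) offset into one bit string.
        self.bits = 0
        for i, row in enumerate(rows):
            base = i * self.tuple_bits
            for c, value in enumerate(row):
                self.bits |= (value - self.mins[c]) << (base + self.field_offsets[c])

    def size_bytes(self) -> int:
        """Bytes occupied by the packed tuples (header not counted)."""
        return (self.nrows * self.tuple_bits + 7) // 8

    def get_field(self, row: int, col: int) -> int:
        """Decode one field with a shift, a mask, and an add; no other
        tuple on the page has to be decompressed."""
        if not 0 <= row < self.nrows:
            raise IndexError("row out of range")
        pos = row * self.tuple_bits + self.field_offsets[col]
        mask = (1 << self.widths[col]) - 1
        return self.mins[col] + ((self.bits >> pos) & mask)

    def get_tuple(self, row: int) -> Tuple[int, ...]:
        """Decode a whole tuple field by field."""
        return tuple(self.get_field(row, c) for c in range(self.ncols))


if __name__ == "__main__":
    # Made-up (year, month, quantity) rows with low-cardinality columns.
    rows = [(1995, 3, 120), (1996, 7, 95), (1995, 12, 130)]
    page = CompressedPage(rows)
    print(page.widths)           # [1, 4, 6]
    print(page.size_bytes())     # 5
    print(page.get_field(1, 2))  # 95
    print(page.get_tuple(2))     # (1995, 12, 130)
```

In this toy example each tuple shrinks to 11 bits, so the three rows fit in 5 bytes of packed data, versus 36 bytes if the nine values were stored as 32-bit integers. Because every tuple has the same width, the position of any field is computed directly from its row and column, which is what allows a single tuple or field to be decoded on its own.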